Sifting data in the real world
Abstract
In the real world, experimental data are rarely, if ever, normally (Gaussian) distributed. For example, a large data set, such as the cross sections for particle scattering as a function of energy contained in the archives of the Particle Data Group [1], is a compendium of all published data and is hence unscreened. Inspection of similar data sets quickly shows that, for many reasons, they contain many outliers (points well beyond what is expected from a normal distribution), which rules out the use of conventional χ² techniques. This note suggests an adaptive algorithm that allows a phenomenologist to apply to the data sample a sieve whose mesh is coarse enough to let the background fall through, but fine enough to retain the preponderance of the signal, thus sifting the data. A prescription is given for finding a robust estimate of the best-fit model parameters in the presence of a noisy background, together with a robust estimate of the model parameter errors and a determination of the goodness-of-fit of the data to the theoretical hypothesis. Extensive computer simulations are carried out to test the algorithm for both its accuracy and its stability under varying background conditions.
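The abstract does not spell out the steps of the algorithm, so the following is only a minimal sketch of the general idea of sifting, under stated assumptions: a straight-line model, a robust first pass using Cauchy (Lorentzian) weights in an iteratively reweighted least-squares loop, a cut on each point's χ² contribution, and a conventional χ² refit on the points that survive. The function names, the cut value of 6, and the choice of robust weighting are all illustrative assumptions, not the paper's prescription.

```python
def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    det = sw * sxx - sx * sx
    b = (sw * sxy - sx * sy) / det
    a = (sy - b * sx) / sw
    return a, b


def chi2_terms(xs, ys, sigmas, a, b):
    """Per-point chi-squared contributions for the line y = a + b*x."""
    return [((y - a - b * x) / s) ** 2 for x, y, s in zip(xs, ys, sigmas)]


def sift_and_fit(xs, ys, sigmas, cut=6.0, n_iter=50):
    """Hypothetical sieve: robust fit, drop outliers, refit (illustrative only).

    cut is an assumed threshold on each point's chi-squared contribution.
    Returns (a, b, chi2, kept_indices).
    """
    base = [1.0 / s ** 2 for s in sigmas]

    # Step 1: robust fit. Cauchy weights 1 / (1 + chi2_i) shrink the
    # influence of points that lie far from the current line.
    a, b = weighted_line_fit(xs, ys, base)
    for _ in range(n_iter):
        terms = chi2_terms(xs, ys, sigmas, a, b)
        ws = [w0 / (1.0 + t) for w0, t in zip(base, terms)]
        a, b = weighted_line_fit(xs, ys, ws)

    # Step 2: the sieve. Keep only points close to the robust fit.
    terms = chi2_terms(xs, ys, sigmas, a, b)
    kept = [i for i, t in enumerate(terms) if t <= cut]

    # Step 3: conventional chi-squared fit on the sifted sample.
    kx = [xs[i] for i in kept]
    ky = [ys[i] for i in kept]
    ks = [sigmas[i] for i in kept]
    a, b = weighted_line_fit(kx, ky, [1.0 / s ** 2 for s in ks])
    chi2 = sum(chi2_terms(kx, ky, ks, a, b))
    return a, b, chi2, kept
```

Because the cut removes the tails of the residual distribution, the χ² of the sifted sample is biased low; it would need a correction before being read as a goodness-of-fit measure, and this sketch does not attempt one.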
Similar resources
The “sieve” Algorithm—sifting Data in the Real World
In real-world applications, experimental data are rarely, if ever, distributed as a normal (Gaussian) distribution. A large set of data, such as the cross sections for particle scattering as a function of energy contained in the archives of the Particle Data Group [1], is a compendium of all published data and hence unscreened. For many reasons, these data sets have many outliers: points well beyon...
On the Hilbert-Huang Transform Theoretical Developments
One of the main heritage tools used in scientific and engineering data spectrum analysis is the Fourier Integral Transform and its high-performance digital equivalent, the Fast Fourier Transform (FFT). Both carry strong a priori assumptions about the source data, such as linearity, stationarity, and satisfying the Dirichlet conditions. A recent development at the National Aeronautics an...
A Navigation System for Personalized Databases: "StarMap"
In light of the proliferation of the World Wide Web (web), the future of large database management lies in addressing the problems of sifting through structured data sources (e.g., data warehouses) with the same tools that are used for sifting through unstructured sources (e.g., web). Doing so will enable faster and more specific decision making as well as facilitate reusable decision ...
Design and Test of the Real-time Text mining dashboard for Twitter
One of today's major research trends in the field of information systems is the discovery of implicit knowledge hidden in datasets that are currently being produced at high speed, in large volumes, and in a wide variety of formats. Data with such features are called big data. Extracting, processing, and visualizing this huge amount of data has become one of the concerns of data science scholar...
MINCE: A Static Global Variable-Ordering for SAT Search and BDD Manipulation
The increasing popularity of SAT and BDD techniques in formal hardware verification and automated synthesis of logic circuits encourages the search for additional speed-ups. Since typical SAT and BDD algorithms are exponential in the worst-case, the structure of real-world instances is a natural source of improvements. While SAT and BDD techniques are often presented as mutually exclusive alter...
Publication date: 2005